Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022)
alp: alpha filter parameter, higher values “lower” the estimated baseflow (thus making it difficult to delineate events)
numpass: number of passes. Ladson et al. (2013) recommend 3 passes for daily data (default in baseflowB() function)
thresh: baseflow index threshold for event delineation, higher threshold values make it “easier” to delineate events
Code
alp <- 0.925
numpass <- 3
thresh <- 0.75
9.2.3 Baseflow separation
Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: “…users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)…”.
It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.
Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site West Brook NWIS
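For intuition, the single-parameter Lyne-Hollick filter underlying baseflowB() can be sketched in base R. This one-pass version, with an assumed simple initialization, is an illustrative sketch rather than the package's implementation (which adds end-point reflection and multiple passes):

```r
# Sketch of one forward pass of the Lyne-Hollick filter (illustrative only;
# baseflowB() adds reflection at the series ends and runs multiple passes)
lh_pass <- function(q, alpha = 0.925) {
  qf <- numeric(length(q))  # filtered quickflow component
  qf[1] <- q[1]             # assumed initialization for illustration
  for (t in 2:length(q)) {
    qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
  }
  bf <- q - pmax(qf, 0)     # baseflow = total flow minus positive quickflow
  pmin(pmax(bf, 0), q)      # constrain baseflow to [0, total flow]
}

q <- c(1, 1, 5, 12, 7, 4, 2.5, 2, 1.8, 1.7)  # synthetic daily yield (mm)
round(lh_pass(q), 2)
```

Higher alpha pushes more of each rise into quickflow, which is why larger alpha values "lower" the estimated baseflow.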
9.2.4 Event identification
There are multiple options for methods of event identification. Section 4.2 in Wasko and Guo (2022): “It can be generalized that, if the aim of the analysis is to identify independent streamflow maxima then eventMaxima() and eventMinima() work well and indeed have been traditionally employed for this task. If identifying independent small events becomes difficult, or the aim is to identify wet spells, eventBaseflow() may be preferred.” In our case, small events are likely important, so we use eventBaseflow() with a relatively large value of BFI_Th to capture those small events.
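Conceptually, a BFI-threshold delineation flags maximal runs of days where the baseflow index sits below BFI_Th. The toy base-R version below illustrates that idea only; it is not hydroEvents' actual eventBaseflow() code, which also handles event merging and rising/falling limbs:

```r
# Toy BFI-threshold event delineation (illustration, not hydroEvents' code):
# an event is a maximal run of days with bfi < BFI_Th
delineate_events <- function(bfi, BFI_Th = 0.75) {
  flag <- bfi < BFI_Th
  r <- rle(flag)                    # run-length encode event/non-event days
  end <- cumsum(r$lengths)
  srt <- end - r$lengths + 1
  data.frame(srt = srt[r$values], end = end[r$values])
}

bfi <- c(0.9, 0.8, 0.5, 0.3, 0.4, 0.8, 0.9, 0.6, 0.7, 0.95)  # made-up BFI series
delineate_events(bfi, BFI_Th = 0.75)  # two events: days 3-5 and days 8-9
```

Raising BFI_Th makes more days fall below the threshold, which is why a higher threshold makes it "easier" to delineate events.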
The aim is to understand how water availability affects differences in yield between Big G and Little g during hydrologic events and the intervening periods (non-event baseflow periods). Therefore, we identify events from the Big G data, apply event/non-event time periods to the little g data, then explore G-g deviations during both events and non-events as a function of
Time series of hydrologic events at Big G, identified using eventBaseflow().
9.2.4.3 Check congruency
Applying Big G event/non-event periods to little g time series data inherently assumes that event/non-event periods would be similarly delineated for little g. If this assumption does not hold, then non-event little g flow would be included in event periods, and vice-versa. How well does this assumption hold?
Code
sites <- c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")

dat_little2 <- dat_little %>% filter(site_name == "Jimmy Brook")
events_little <- eventBaseflow(dat_little2$Yield_filled_mm, BFI_Th = 0.75)

# define positions of non-events
srt <- c(1)
end <- c(events_little$srt[1] - 1)
for (i in 2:(dim(events_little)[1])) {
  srt[i] <- events_little$end[i - 1] + 1
  end[i] <- events_little$srt[i] - 1
}
nonevents <- data.frame(tibble(srt, end) %>%
                          mutate(len = end - srt) %>%
                          filter(len >= 0) %>%
                          select(-len) %>%
                          add_row(srt = events_little$end[dim(events_little)[1]] + 1,
                                  end = dim(dat_little2)[1]))

# create vectors of binary event/non-event and event IDs
isevent_vec <- rep(2, times = dim(dat_little2)[1])
eventid_vec <- rep(NA, times = dim(dat_little2)[1])
for (i in 1:dim(events_little)[1]) {
  isevent_vec[c(events_little[i, 1]:events_little[i, 2])] <- 1
  eventid_vec[c(events_little[i, 1]:events_little[i, 2])] <- i
}

# create vector of non-event IDs
noneventid_vec <- rep(NA, times = dim(dat_little2)[1])
for (i in 1:dim(nonevents)[1]) {
  noneventid_vec[c(nonevents[i, 1]:nonevents[i, 2])] <- i
}

# create vector of "agnostic events": combined hydro events and non-events
agnevents <- rbind(events_little %>% select(srt, end) %>% mutate(event = 1),
                   nonevents %>% mutate(event = 0)) %>%
  arrange(srt)
agneventid_vec <- c()
for (i in 1:dim(agnevents)[1]) {
  agneventid_vec[c(agnevents[i, 1]:agnevents[i, 2])] <- i
}

# add event/non-event vectors to little g data
dat_little2 <- dat_little2 %>%
  mutate(isevent = isevent_vec,
         eventid = eventid_vec,
         noneventid = noneventid_vec,
         agneventid = agneventid_vec,
         little_event_yield = ifelse(isevent_vec == 1, Yield_filled_mm, NA),
         little_nonevent_yield = ifelse(isevent_vec == 2, Yield_filled_mm, NA),
         little_event_quick = little_event_yield - bf) %>%
  rename(little_yield = Yield_filled_mm, little_bf = bf, little_bfi = bfi)

dat_big %>%
  select(date, big_event_yield) %>%
  left_join(dat_little2 %>% select(date, little_event_yield)) %>%
  dygraph() %>%
  dyRangeSelector() %>%
  dyAxis("y", label = "Daily yield (mm)") %>%
  dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
Time series of yield for Big G (West Brook NWIS) and one little g site (Jimmy Brook) during hydrologic events as delineated for Big G and little g, respectively.
Whether or not events align between G and g is highly variable. In some cases, g events begin/end prior to G events, and in other cases g events begin/end later than G events. In some cases g events are shorter than G events, and in other cases they are longer. In many cases, events are perfectly matched. Importantly, peaks in yield are almost always synchronous.
Ultimately, does this matter given that we are simply using this as a method to break up our data? Furthermore, the framing of the ~entire project is that Big G is the reference by which to compare all little g’s. In this sense, applying event/non-event periods derived from G to g matches this perspective.
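If a more quantitative congruency check is wanted, a simple metric is the fraction of days on which the two delineations agree. event_agreement() below is a hypothetical helper (not part of the analysis above), assuming the 1 = event / 2 = non-event coding used for isevent_vec:

```r
# Hypothetical helper: fraction of days on which the Big G and little g
# classifications agree (coding assumed: 1 = event, 2 = non-event)
event_agreement <- function(isevent_G, isevent_g) {
  mean(isevent_G == isevent_g)
}

# made-up 10-day classifications for illustration
isevent_G <- c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2)
isevent_g <- c(2, 1, 1, 1, 2, 2, 2, 1, 1, 2)
event_agreement(isevent_G, isevent_g)  # 0.8: disagreement on days 2 and 5
```

Values near 1 would indicate that applying G-derived periods to g loses little information.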
9.2.5 Join events to Little g
Code
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_wb2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, basin, subbasin, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge),
            yield_big_cumul = sum(big_yield + fudge),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge)),
            yield_big_mean_log = mean(log(big_yield + fudge)),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         )
dat_wb2
# A tibble: 708 × 22
site_name basin subbasin agneventid eventlen mindate isevent
<fct> <chr> <chr> <int> <int> <date> <dbl>
1 Avery Brook West Brook West Brook 5 6 2020-02-20 2
2 Avery Brook West Brook West Brook 6 6 2020-02-26 1
3 Avery Brook West Brook West Brook 7 4 2020-03-03 1
4 Avery Brook West Brook West Brook 8 5 2020-03-07 2
5 Avery Brook West Brook West Brook 9 4 2020-03-12 1
6 Avery Brook West Brook West Brook 10 2 2020-03-16 2
7 Avery Brook West Brook West Brook 11 4 2020-03-18 1
8 Avery Brook West Brook West Brook 12 4 2020-03-22 2
9 Avery Brook West Brook West Brook 13 11 2020-03-26 1
10 Avery Brook West Brook West Brook 14 3 2020-04-06 2
# ℹ 698 more rows
# ℹ 15 more variables: yield_little_cumul <dbl>, yield_big_cumul <dbl>,
# yield_little_cumul_log <dbl>, yield_big_cumul_log <dbl>,
# yield_little_mean <dbl>, yield_big_mean <dbl>, yield_little_mean_log <dbl>,
# yield_big_mean_log <dbl>, yield_little_q10_log <dbl>,
# yield_little_q50_log <dbl>, yield_little_q90_log <dbl>,
# yield_big_q10_log <dbl>, yield_big_q50_log <dbl>, …
Code
# write to file
write_csv(dat_wb2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_WestBrookonly.csv")
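The fudge constant added throughout the summaries guards against zero-flow days, which would otherwise map to -Inf under the log transform:

```r
# why a fudge factor is needed before logging zero-flow days
fudge <- 0.01     # as set above
log(0)            # -Inf: a zero-flow day breaks log-scale summaries
log(0 + fudge)    # finite once the fudge factor is added (log(0.01) ~ -4.61)
```

The trade-off is that the fudge compresses differences among the very smallest flows, so its size matters most during low-flow non-event periods.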
View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudoreplication):
p1 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p2 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p3 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p4 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p5 <- dat_wb2 %>%
  ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics.
View relationship between Big G and little g, colored by event/non-event and faceted by site. For most sites (except maybe Obear Brook), G-g relationships are identical between events and non-events.
dat_wb2 %>%
  mutate(year = as.factor(year(mindate))) %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm") +
  facet_wrap(~site_name)
9.2.6.1 To log or not to log?
Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:
Code
dat_wb2 %>%
  mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>%
  ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE)
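One thing to keep in mind when comparing scales: the event summaries above store the mean of logged yields (e.g., yield_little_mean_log), which is not the log of the mean yield; by Jensen's inequality the mean of logs is never larger. So the log-scale and original-scale plots are not just relabeled axes. A minimal base-R illustration with made-up yields:

```r
# mean-of-logs vs log-of-mean for a hypothetical event's daily yields (mm)
y <- c(0.2, 0.5, 1.0, 4.0, 12.0)
mean(log(y))  # mean of logs, as computed in yield_little_mean_log
log(mean(y))  # log of the arithmetic mean; >= mean(log(y)) for positive data
```

The gap between the two grows with within-event variability, so events with flashy hydrographs shift most between the two scales.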
9.3 Spread Creek
9.3.1 Trim to focal sites
Code
dat_sp <- dat %>%
  filter(basin == "Snake River", site_name != "Pacific Creek at Moran NWIS") %>%
  mutate(site_name = recode(site_name,
                            "Leidy Creek Mouth NWIS" = "Leidy Creek Mouth",
                            "SF Spread Creek Lower NWIS" = "SF Spread Creek Lower")) %>%
  group_by(site_name, date) %>%
  summarise(Yield_filled_mm = mean(Yield_filled_mm),
            flow_mean_filled_cms = mean(flow_mean_filled_cms),
            area_sqkm = mean(area_sqkm),
            WaterYear = unique(WaterYear)) %>%
  ungroup()
unique(dat_sp$site_name)
[1] "Grizzly Creek" "Grouse Creek" "Leidy Creek Mouth"
[4] "Leidy Creek Upper" "NF Spread Creek Lower" "NF Spread Creek Upper"
[7] "Rock Creek" "SF Spread Creek Lower" "SF Spread Creek Upper"
[10] "Spread Creek Dam"
Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022)
alp: alpha filter parameter, higher values “lower” the estimated baseflow (thus making it difficult to delineate events)
numpass: number of passes. Ladson et al. (2013) recommend 3 passes for daily data (default in baseflowB() function)
thresh: baseflow index threshold for event delineation, higher threshold values make it “easier” to delineate events
Code
alp <- 0.925
numpass <- 3
thresh <- 0.85
9.3.3 Baseflow separation
Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: “…users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)…”.
It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.
p1 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p2 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p3 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p4 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
p5 <- dat_sp2 %>%
  ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) +
  geom_point(alpha = 0.25) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE) +
  theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics.
View relationship between Big G and little g, colored by event/non-event and faceted by site. For most sites, G-g relationships are identical between events and non-events.
dat_sp2 %>%
  mutate(year = as.factor(year(mindate))) %>%
  ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm") +
  facet_wrap(~site_name)
9.3.6.1 To log or not to log?
Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:
Code
dat_sp2 %>%
  mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>%
  ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_smooth(method = "lm", se = FALSE)
9.4 DEPRECATED
9.5 Sensitivity analyses
9.5.1 Missing data
Many of the EcoDrought time series data are incomplete. At some sites, discharge data are available only during the summer and/or fall periods, and at other sites, time series data are interrupted by malfunctioning sensors and/or ice formation (“ice spikes”). So how does the length of the time series affect baseflow separation (and subsequent event identification)? Wasko and Guo (2022) use a 67-day time series of flow to demonstrate the utility of the hydroEvents package, suggesting that digital baseflow separation techniques may be valid for relatively short time series.
Here, I perform a simple sensitivity analysis to explore the effect of time series length on the results of baseflow separation: essentially, I perform baseflow separation on successively smaller subsets of the data. With the default parameters, the minimum number of days/observations needed is 31. This is because the default number of points reflected at the start and end of the data (r) is 30. Reflection allows bf/bfi to be calculated over the entire period of record, as the underlying baseflow separation equations otherwise suffer issues of “warm-up” and “cool-down” as the recursive filter is moved forward and backward over the dataset (Ladson et al. 2013, Australian Journal of Water Resources). baseflowB() uses a default reflection period of 30, which Ladson et al. (2013) found to “provide a realistic baseflow response for the start and end of the actual flow data”.
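The reflection step can be sketched in base R. reflect_series() below is an illustrative stand-in (the function name and exact mirroring are assumptions for illustration, not the package's internal code), but it shows why r = 30 implies a minimum of 31 observations:

```r
# Sketch of end-point reflection used to warm up/cool down the filter;
# illustrative only -- baseflowB() handles this internally (default r = 30)
reflect_series <- function(q, r = 30) {
  n <- length(q)
  stopifnot(n >= r + 1)  # with r = 30, at least 31 observations are required
  c(rev(q[2:(r + 1)]),            # mirror of the first r points (endpoint excluded)
    q,
    rev(q[(n - r):(n - 1)]))      # mirror of the last r points
}

q <- seq_len(40)
length(reflect_series(q))  # 40 + 2 * 30 = 100
```

The filter is then run over the padded series and the reflected ends are discarded, so only the real data carry bf/bfi estimates.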
9.5.1.1 Compare baseflow
Divergence in baseflow among datasets is a result of the reflected data of the shorter dataset not matching the actual data of the longer dataset. As a result, divergence really only occurs at the end of each time series and is generally small in magnitude.
9.5.1.2 Compare baseflow index
The story here is essentially the same as above: divergence is ~minimal and restricted to the end of each time series. However, we note that divergence in BFI appears to increase as absolute flow/baseflow decreases, because small differences in absolute space become much larger in relative space when absolute values are small.
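That scale effect is easy to make concrete: holding the absolute divergence in baseflow fixed, the implied divergence in BFI grows as total flow shrinks (the values below are made up for illustration):

```r
# a fixed absolute baseflow divergence expressed as a BFI difference
bf_diff <- 0.05         # hypothetical absolute divergence in baseflow (mm)
flow <- c(10, 1, 0.1)   # progressively smaller total daily flow (mm)
bf_diff / flow          # 0.005 0.050 0.500 -- same mm difference, growing BFI gap
```

So the larger end-of-series BFI divergence at low flows need not reflect a worse baseflow estimate in absolute terms.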
View relationship between Big G and among-site/event-specific standard deviation in little g.
---title: "Hydro Event Delineation"---Purpose: Conduct baseflow separation and delineate hydrologic events to model in Gg framework```{r echo=FALSE, message=FALSE}library(tidyverse)library(sf)library(mapview)library(knitr)library(fasstr)library(RColorBrewer)library(scales)library(dygraphs)library(hydroEvents)library(GGally)library(dataRetrieval)library(ggpubr)```## Data### Site info and daily data```{r}# site information and locationssiteinfo <-read_csv("C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/Data/EcoDrought_SiteInformation.csv")siteinfo_sp <-st_as_sf(siteinfo, coords =c("long", "lat"), crs =4326)mapview(siteinfo_sp, zcol ="designation")# flow/yield (and temp) data dat <-read_csv("C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/Data/EcoDrought_FlowTempData_DailyWeekly.csv") %>%filter(!site_name %in%c("WoundedBuckCreek", "Brackett Creek"))# add water/climate year variablesdat <-add_date_variables(dat, dates = date, water_year_start =4)str(dat)```## West Brook### Trim to focal sites```{r}dat_wb <- dat %>%filter(site_name %in%c("West Brook NWIS", "West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook"))```View daily yield at Big G```{r}#| fig-cap: "Total daily streamflow in yield (mm), over the period of record"dat_wb %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Daily yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```View daily yield at all sites```{r}dat_wb %>%filter(site_name !="West Brook NWIS") %>%select(date, site_name, Yield_filled_mm) %>%spread(key = site_name, value = Yield_filled_mm) %>%left_join(dat_wb %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm) %>%rename(West_Brook_NWIS = Yield_filled_mm)) %>%dygraph() %>%dyRangeSelector() 
%>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(colors =c(hcl.colors(9, "Zissou 1"), "black")) ```### Set parametersSet baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation paramters (as in Wasko and Guo, 2022)* *alp*: alpha filter parameter, higher values "lower" the estimated baseflow (thus making it difficult to delineate events)* *numpass*: number of passes. Ladson *et al*. (2013) recommend 3 passes for daily data (default in baseflowB() function)* *thresh*: baseflow index threshold for event delineation, higher threshold values make it "easier" to delineate events```{r}alp <-0.925numpass <-3thresh <-0.75```### Baseflow separationPerform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: "...users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)...".It is not actually necessary to do this separately, as baseflow separation is conducted in the event delineation function internally. 
But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.```{r}dat_wb_bf <- dat_wb %>%filter(!is.na(Yield_filled_mm)) %>%select(site_name, basin, subbasin, WaterYear, date, Yield_filled_mm, flow_mean_filled_cms, area_sqkm) %>%group_by(site_name) %>%mutate(bf =baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bf, bfi =baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bfi) %>%ungroup()head(dat_wb_bf)``````{r}#| fig-cap: "Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site West Brook NWIS"dat_wb_bf %>%filter(site_name =="West Brook NWIS") %>%select(date, Yield_filled_mm, bf, bfi) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```### Event identificationThere are multiple options for methods of event identification. Section 4.2 in Wasko and Guo (2022): It can be generalized that, if the aim of the analysis is to identify independent streamflow maxima then eventMaxima() and eventMinima() work well and indeed have been traditionally employed for this task. If identifying independent small events becomes difficult, or the aim is to identify wet spells, eventBaseflow() may be preferred." In our case, small events are likely important, so we use eventBaseflow() with a ~large values for BFI_Th to capture those small events.The aim is to understand how water availability affects differences in yield between Big G and Little g during hydrologic events and the intervening periods (non-event baseflow periods). 
Therefore, we identify events from the Big G data, apply event/non-event time periods to the little g data, then explore G-g deviations during both events and non-events as a function of #### Identify eventsSeparate Big G and Little G data```{r}dat_big <- dat_wb_bf %>%filter(site_name =="West Brook NWIS")dat_little <- dat_wb_bf %>%filter(site_name !="West Brook NWIS")```Identify events at Big G```{r}events <-eventBaseflow(dat_big$Yield_filled_mm, BFI_Th = thresh, bfi = dat_big$bfi)events <- events %>%mutate(len = end - srt +1)(events)```Plot Big G events using the default function```{r fig.height=4, fig.width = 9}#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."par(mar = c(3,3,1,1))plotEvents(dat_big$Yield_filled_mm, events = events, main = NA, xlab = "Time-step", ylab = "Yield")```#### Tidy eventsNow add variables to the Big G time series data specifying events and non-events ```{r}# define positions of non-eventssrt <-c(1)end <-c(events$srt[1]-1)for (i in2:(dim(events)[1])) { srt[i] <- events$end[i-1]+1 end[i] <- events$srt[i]-1}nonevents <-data.frame(tibble(srt, end) %>%mutate(len = end - srt) %>%filter(len >=0) %>%select(-len) %>%add_row(srt = events$end[dim(events)[1]]+1, end =dim(dat_big)[1]))# create vectors of binary event/non-event and event IDsisevent_vec <-rep(2, times =dim(dat_big)[1])eventid_vec <-rep(NA, times =dim(dat_big)[1])for (i in1:dim(events)[1]) { isevent_vec[c(events[i,1]:events[i,2])] <-1 eventid_vec[c(events[i,1]:events[i,2])] <- i}# create vector of non-event IDsnoneventid_vec <-rep(NA, times =dim(dat_big)[1])for (i in1:dim(nonevents)[1]) { noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i }# create vector of "agnostic events": combined hydro events and non-eventsagnevents <-rbind(events %>%select(srt, end) %>%mutate(event =1), nonevents %>%mutate(event =0)) %>%arrange((srt))agneventid_vec <-c()for (i in1:dim(agnevents)[1]){ agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i }# add 
event/non-event vectors to Big G datadat_big <- dat_big %>%mutate(isevent = isevent_vec, eventid = eventid_vec,noneventid = noneventid_vec,agneventid = agneventid_vec,big_event_yield =ifelse(isevent_vec ==1, Yield_filled_mm, NA),big_nonevent_yield =ifelse(isevent_vec ==2, Yield_filled_mm, NA),big_event_quick = big_event_yield - bf) %>%rename(big_yield = Yield_filled_mm, big_bf = bf, big_bfi = bfi, big_flow = flow_mean_filled_cms, big_area_sqkm = area_sqkm)(dat_big)``````{r}#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."dat_big %>%select(date, big_yield, big_bf, big_event_yield, big_nonevent_yield) %>%dygraph() %>%dyRangeSelector() %>%dyAxis("y", label ="Yield (mm)") %>%dyOptions(fillGraph =TRUE, drawGrid =FALSE, axisLineWidth =1.5)```#### Check congruencyApplying Big G event/non-event periods to little g time series data inherently assumes that event/non-event periods would be similarly delineated for little g. If this assumption does not hold, then non-event little g flow would be included in event periods, and vice-versa. How well does this assumption hold? ```{r}#| fig-cap: "Time series of yield for Big G (West Brook NWIS) and one little g site (Jimmy Brook) during hydrologic events as delineated for Big G and little g, respectively. 
"sites <-c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")dat_little2 <- dat_little %>%filter(site_name =="Jimmy Brook")events_little <-eventBaseflow(dat_little2$Yield_filled_mm, BFI_Th =0.75)# define positions of non-eventssrt <-c(1)end <-c(events_little$srt[1]-1)for (i in2:(dim(events_little)[1])) { srt[i] <- events_little$end[i-1]+1 end[i] <- events_little$srt[i]-1}nonevents <-data.frame(tibble(srt, end) %>%mutate(len = end - srt) %>%filter(len >=0) %>%select(-len) %>%add_row(srt = events_little$end[dim(events_little)[1]]+1, end =dim(dat_little2)[1]))# create vectors of binary event/non-event and event IDsisevent_vec <-rep(2, times =dim(dat_little2)[1])eventid_vec <-rep(NA, times =dim(dat_little2)[1])for (i in1:dim(events_little)[1]) { isevent_vec[c(events_little[i,1]:events_little[i,2])] <-1 eventid_vec[c(events_little[i,1]:events_little[i,2])] <- i}# create vector of non-event IDsnoneventid_vec <-rep(NA, times =dim(dat_little2)[1])for (i in1:dim(nonevents)[1]) { noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i }# create vector of "agnostic events": combined hydro events and non-eventsagnevents <-rbind(events_little %>%select(srt, end) %>%mutate(event =1), nonevents %>%mutate(event =0)) %>%arrange((srt))agneventid_vec <-c()for (i in1:dim(agnevents)[1]){ agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i }# add event/non-event vectors to Big G datadat_little2 <- dat_little2 %>%mutate(isevent = isevent_vec, eventid = eventid_vec,noneventid = noneventid_vec,agneventid = agneventid_vec,little_event_yield =ifelse(isevent_vec ==1, Yield_filled_mm, NA),little_nonevent_yield =ifelse(isevent_vec ==2, Yield_filled_mm, NA),little_event_quick = little_event_yield - bf) %>%rename(little_yield = Yield_filled_mm, little_bf = bf, little_bfi = bfi)dat_big %>%select(date, big_event_yield) %>%left_join(dat_little2 %>%select(date, 
little_event_yield)) %>%
  dygraph() %>%
  dyRangeSelector() %>%
  dyAxis("y", label = "Daily yield (mm)") %>%
  dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

Whether or not events align between G and g is highly variable. In some cases, g events begin/end before G events; in other cases, g events begin/end after G events. Some g events are shorter than G events, others are longer, and many are perfectly matched. Importantly, peaks in yield are almost always synchronous. Ultimately, does this matter given that we are simply using this as a method to break up our data? Furthermore, the framing of the entire project is that Big G is the reference against which all little g's are compared. In this sense, applying event/non-event periods derived from G to g matches this perspective.

### Join events to Little g

```{r}
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_wb2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, basin, subbasin, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge),
            yield_big_cumul = sum(big_yield + fudge),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge)),
            yield_big_mean_log = mean(log(big_yield + fudge)),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("West Brook Lower", "Mitchell Brook", "Jimmy Brook", "Obear Brook Lower", "West Brook Upper", "West Brook Reservoir", "Sanderson Brook", "Avery Brook", "West Whately Brook")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         )
dat_wb2

# write to file
write_csv(dat_wb2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_WestBrookonly.csv")
```

View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudo-replication):

```{r fig.width=9, fig.height=9, results='hide'}
myplot <- ggpairs(dat_wb2 %>% filter(site_name == "Avery Brook") %>% select(mindate, yield_big_cumul_log, yield_big_mean_log, yield_big_q10_log, yield_big_q50_log, yield_big_q90_log))
rctib <- tibble(r = c(3,4,4,5,5,5,6,6,6,6), c = c(2,2,3,2,3,4,2,3,4,5))
for (i in 1:nrow(rctib)) {
  myplot[rctib$r[i], rctib$c[i]] <- myplot[rctib$r[i], rctib$c[i]] + geom_abline(intercept = 0, slope = 1, color = "red")
}
myplot
```

### Plot gG relationships

```{r, fig.height=6, fig.width=9}
#| fig-cap: "Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics."
p1 <- dat_wb2 %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p2 <- dat_wb2 %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p3 <- dat_wb2 %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p4 <- dat_wb2 %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p5 <- dat_wb2 %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
```

View relationship between Big G and little g, color by event/non-event, facet by site.
For most sites (except maybe Obear Brook), G-g relationships are identical between events and non-events.

```{r}
myplotlist <- list()
# cumulative
myplotlist[[1]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# mean
myplotlist[[2]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 10% quantile
myplotlist[[3]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 50% quantile
myplotlist[[4]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 90% quantile
myplotlist[[5]] <- dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

::: panel-tabset

##### Cumul.

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[1]]
```

##### Mean

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[2]]
```

##### Q10

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[3]]
```

##### Q50

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[4]]
```

##### Q90

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[5]]
```

:::

Means, as above, but color by year.

```{r fig.width=8, fig.height=7}
dat_wb2 %>% mutate(year = as.factor(year(mindate))) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

#### To log or not to log?

Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:

```{r}
dat_wb2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F)
```

## Spread Creek

### Trim to focal sites

```{r}
dat_sp <- dat %>%
  filter(basin == "Snake River", site_name != "Pacific Creek at Moran NWIS") %>%
  mutate(site_name = recode(site_name, "Leidy Creek Mouth NWIS" = "Leidy Creek Mouth", "SF Spread Creek Lower NWIS" = "SF Spread Creek Lower")) %>%
  group_by(site_name, date) %>%
  summarise(Yield_filled_mm = mean(Yield_filled_mm), flow_mean_filled_cms = mean(flow_mean_filled_cms), area_sqkm = mean(area_sqkm), WaterYear = unique(WaterYear)) %>%
  ungroup()
unique(dat_sp$site_name)
```

View daily yield at Big G

```{r}
#| fig-cap: "Total daily streamflow in yield (mm), over the period of record"
temp <- dat_sp %>% filter(site_name == "Spread Creek Dam")
temp <- fill_missing_dates(temp, dates = date)
temp %>% select(date, Yield_filled_mm) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

View daily yield at all sites

```{r}
dat_sp %>% filter(site_name != "Spread Creek Dam") %>% select(date, site_name, Yield_filled_mm) %>% spread(key = site_name, value = Yield_filled_mm) %>% left_join(dat_sp %>% filter(site_name == "Spread Creek Dam") %>% select(date, Yield_filled_mm) %>% rename(Spread_Creek_Dam = Yield_filled_mm)) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(colors = c(hcl.colors(9, "Zissou 1"), "black"))
```

### Set parameters

Set baseflow separation (for the Lyne-Hollick one-parameter digital recursive filter) and event delineation parameters (as in Wasko and Guo, 2022):

* *alp*: alpha filter parameter; higher values "lower" the estimated baseflow (thus making it more difficult to delineate events)
* *numpass*: number of passes. Ladson *et al.* (2013) recommend 3 passes for daily data (the default in the baseflowB() function)
* *thresh*: baseflow index threshold for event delineation; higher threshold values make it "easier" to delineate events

```{r}
alp <- 0.925
numpass <- 3
thresh <- 0.85
```

### Baseflow separation

Perform baseflow separation. See Wasko and Guo (2022, Hydrological Processes) for recommendations on parameterization, as different algorithms and alpha values can produce different results. Section 4.1: "...users should not simply use recommended filter parameter values from literature in combination with any baseflow filter code without verification of their choice of filter parameter. As the digital baseflow filter is not tied to any physical realism (Nathan and McMahon, 1990) and a larger fractional baseflow may aid identification of events—even if this is not strictly baseflow as per its definition (Linsley et al., 1958)...".

It is not actually necessary to do this separately, as baseflow separation is conducted internally by the event delineation function.
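Before settling on a value for *alp*, it can help to see how sensitive the overall baseflow fraction is to the filter parameter. The sketch below uses a bare-bones single forward pass of the Lyne-Hollick filter on a toy series; this is an illustration of the sensitivity only, not the baseflowB() internals (baseflowB() adds multiple passes and end-point reflection, so its values will differ):

```r
# Single forward pass of the Lyne-Hollick filter (sketch; not the baseflowB() internals)
lh_baseflow <- function(q, alpha) {
  qf <- numeric(length(q))  # filtered quickflow
  qf[1] <- q[1]
  for (t in 2:length(q)) {
    qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
  }
  pmin(pmax(q - qf, 0), q)  # baseflow, constrained to [0, q]
}

set.seed(1)
q <- 5 + abs(cumsum(rnorm(200)))  # toy daily yield series (made-up values)

# overall baseflow fraction under candidate alpha values
fracs <- sapply(c(0.900, 0.925, 0.950, 0.980), function(a) sum(lh_baseflow(q, a)) / sum(q))
fracs
```

Even on this toy series, the baseflow fraction shifts with alpha, which is the behavior the Wasko and Guo (2022) caution is about: the parameter choice should be verified against the record being filtered.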
But it is helpful to view the results and explore how different parameterizations yield different baseflow contributions.

```{r}
yrs <- unlist(unique(dat_sp %>% filter(site_name == "Spread Creek Dam", !is.na(Yield_filled_mm)) %>% select(WaterYear)))
datlist <- list()
for (i in 1:length(yrs)) {
  datlist[[i]] <- dat_sp %>%
    filter(site_name == "Spread Creek Dam", WaterYear == yrs[i], !is.na(Yield_filled_mm)) %>%
    mutate(bf = baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bf,
           bfi = baseflowB(Yield_filled_mm, alpha = alp, passes = numpass)$bfi)
}
dat_sp_bf <- do.call(rbind, datlist)
```

```{r}
#| fig-cap: "Total daily streamflow in yield (mm), baseflow in yield, and fractional baseflow index over the period of record for site Spread Creek Dam (Big G)"
fill_missing_dates(dat_sp_bf, dates = date) %>% select(date, Yield_filled_mm, bf, bfi) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

### Event identification

#### Identify events

Separate Big G and little g data

```{r}
dat_big <- dat_sp_bf %>% filter(site_name == "Spread Creek Dam")
dat_little <- dat_sp %>% filter(site_name != "Spread Creek Dam")
```

Identify events at Big G and tidy, looping over years (i.e., chunks of data)

```{r}
yrs <- unlist(unique(dat_sp %>% filter(site_name == "Spread Creek Dam", !is.na(Yield_filled_mm)) %>% select(WaterYear)))
biglist <- list()
for (k in 1:length(yrs)) {
  ddd <- dat_sp_bf %>% filter(WaterYear == yrs[k])
  events <- eventBaseflow(ddd$Yield_filled_mm, BFI_Th = thresh, bfi = ddd$bfi)
  events <- events %>% mutate(len = end - srt + 1)
  # define positions of non-events
  srt <- c(1)
  end <- c(events$srt[1] - 1)
  for (i in 2:(dim(events)[1])) {
    srt[i] <- events$end[i-1] + 1
    end[i] <- events$srt[i] - 1
  }
  nonevents <- tibble(srt, end) %>% mutate(len = end - srt) %>% filter(len >= 0) %>% select(-len)
  if (events$end[dim(events)[1]] != dim(ddd)[1]) {
    nonevents <- nonevents %>% add_row(srt = events$end[dim(events)[1]] + 1, end = dim(ddd)[1])
  }
  nonevents <- data.frame(nonevents)
  # create vectors of binary event/non-event and event IDs
  isevent_vec <- rep(2, times = dim(ddd)[1])
  eventid_vec <- rep(NA, times = dim(ddd)[1])
  for (i in 1:dim(events)[1]) {
    isevent_vec[c(events[i,1]:events[i,2])] <- 1
    eventid_vec[c(events[i,1]:events[i,2])] <- i
  }
  # create vector of non-event IDs
  noneventid_vec <- rep(NA, times = dim(ddd)[1])
  for (i in 1:dim(nonevents)[1]) {
    noneventid_vec[c(nonevents[i,1]:nonevents[i,2])] <- i
  }
  # create vector of "agnostic events": combined hydro events and non-events
  agnevents <- rbind(events %>% select(srt, end) %>% mutate(event = 1), nonevents %>% mutate(event = 0)) %>% arrange(srt)
  agneventid_vec <- c()
  for (i in 1:dim(agnevents)[1]) {
    agneventid_vec[c(agnevents[i,1]:agnevents[i,2])] <- i
  }
  # add event/non-event vectors to Big G data
  biglist[[k]] <- ddd %>%
    mutate(isevent = isevent_vec,
           eventid = eventid_vec,
           noneventid = noneventid_vec,
           agneventid = agneventid_vec,
           big_event_yield = ifelse(isevent_vec == 1, Yield_filled_mm, NA),
           big_nonevent_yield = ifelse(isevent_vec == 2, Yield_filled_mm, NA),
           big_event_quick = big_event_yield - bf) %>%
    mutate(agneventid = paste(yrs[k], "_", agneventid, sep = "")) %>%
    rename(big_yield = Yield_filled_mm, big_bf = bf, big_bfi = bfi, big_flow = flow_mean_filled_cms, big_area_sqkm = area_sqkm)
}
dat_big <- do.call(rbind, biglist)
dat_big
```

```{r}
#| fig-cap: "Time series of hydrologic events at Big G, identified using eventBaseflow()."
fill_missing_dates(dat_big, dates = date) %>% select(date, big_yield, big_bf, big_event_yield, big_nonevent_yield) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

### Join events to Little g

```{r}
# fudge factor to deal with 0 flow, when logged
fudge <- 0.01

# join big g events to little g and summarize by event (cumulative, mean, and quantiles)
dat_sp2 <- dat_little %>%
  filter(date >= min(dat_big$date) & date <= max(dat_big$date)) %>%
  left_join(dat_big %>% select(-site_name)) %>%
  group_by(site_name, agneventid) %>%
  summarise(eventlen = n(),
            mindate = min(date),
            isevent = unique(isevent),
            n_little = sum(!is.na(Yield_filled_mm)),
            n_big = sum(!is.na(big_yield)),
            # cumulative
            yield_little_cumul = sum(Yield_filled_mm + fudge, na.rm = TRUE),
            yield_big_cumul = sum(big_yield + fudge, na.rm = TRUE),
            yield_little_cumul_log = log(yield_little_cumul),
            yield_big_cumul_log = log(yield_big_cumul),
            # mean
            yield_little_mean = mean(Yield_filled_mm + fudge),
            yield_big_mean = mean(big_yield + fudge),
            yield_little_mean_log = mean(log(Yield_filled_mm + fudge), na.rm = TRUE),
            yield_big_mean_log = mean(log(big_yield + fudge), na.rm = TRUE),
            # quantiles
            yield_little_q10_log = quantile(log(Yield_filled_mm + fudge), probs = 0.10, na.rm = TRUE),
            yield_little_q50_log = quantile(log(Yield_filled_mm + fudge), probs = 0.50, na.rm = TRUE),
            yield_little_q90_log = quantile(log(Yield_filled_mm + fudge), probs = 0.90, na.rm = TRUE),
            yield_big_q10_log = quantile(log(big_yield + fudge), probs = 0.10, na.rm = TRUE),
            yield_big_q50_log = quantile(log(big_yield + fudge), probs = 0.50, na.rm = TRUE),
            yield_big_q90_log = quantile(log(big_yield + fudge), probs = 0.90, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(site_name = factor(site_name, levels = c("Rock Creek", "SF Spread Creek Lower", "Grouse Creek", "SF Spread Creek Upper", "Leidy Creek Mouth", "Leidy Creek Upper", "NF Spread Creek Lower", "NF Spread Creek Upper", "Grizzly Creek")),
         site_name_cd = as.numeric(site_name)
         # z_yield_big_cumul_log = as.numeric(scale(yield_big_cum_log, center = TRUE, scale = TRUE)),
         # z_yield_big_mean_log = as.numeric(scale(yield_big_mean_log, center = TRUE, scale = TRUE))
         ) %>%
  filter(!is.na(agneventid))
dat_sp2 <- dat_sp2[dat_sp2$n_little == dat_sp2$n_big,]
dat_sp2

# write to file
write_csv(dat_sp2, "C:/Users/jbaldock/OneDrive - DOI/Documents/USGS/EcoDrought/EcoDrought Working/EcoDrought-Analysis/Event Delineation/Basin files/EcoDrought_Data_EventNonEvent_SpreadCreekonly.csv")
```

View pairs plot of summary statistics, for Big G only (limit to one little g site to avoid pseudo-replication):

```{r fig.width=9, fig.height=9, results='hide'}
myplot <- ggpairs(dat_sp2 %>% select(mindate, yield_big_cumul_log, yield_big_mean_log, yield_big_q10_log, yield_big_q50_log, yield_big_q90_log))
rctib <- tibble(r = c(3,4,4,5,5,5,6,6,6,6), c = c(2,2,3,2,3,4,2,3,4,5))
for (i in 1:nrow(rctib)) {
  myplot[rctib$r[i], rctib$c[i]] <- myplot[rctib$r[i], rctib$c[i]] + geom_abline(intercept = 0, slope = 1, color = "red")
}
myplot
```

### Plot gG relationships

```{r, fig.height=6, fig.width=9}
#| fig-cap: "Relationship between yield at Big G and yield at little g, summarized during event/non-event periods with five different metrics."
p1 <- dat_sp2 %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p2 <- dat_sp2 %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p3 <- dat_sp2 %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p4 <- dat_sp2 %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
p5 <- dat_sp2 %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = site_name, color = site_name)) + geom_point(alpha = 0.25) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F) + theme(legend.position = "none")
ggarrange(p1, p2, p3, p4, p5, nrow = 2, ncol = 3)
```

View relationship between Big G and little g, color by event/non-event, facet by site. For most sites, G-g relationships are identical between events and non-events.

```{r}
myplotlist <- list()
# cumulative
myplotlist[[1]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_cumul_log, y = yield_little_cumul_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# mean
myplotlist[[2]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 10% quantile
myplotlist[[3]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q10_log, y = yield_little_q10_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 50% quantile
myplotlist[[4]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q50_log, y = yield_little_q50_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
# 90% quantile
myplotlist[[5]] <- dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_q90_log, y = yield_little_q90_log, group = isevent, color = isevent)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

::: panel-tabset

#### Cumul.

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[1]]
```

#### Mean

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[2]]
```

#### Q10

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[3]]
```

#### Q50

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[4]]
```

#### Q90

```{r, echo=FALSE, fig.width=8, fig.height=7}
myplotlist[[5]]
```

:::

Means, as above, but color by year.

```{r fig.width=8, fig.height=7}
dat_sp2 %>% mutate(year = as.factor(year(mindate))) %>% ggplot(aes(x = yield_big_mean_log, y = yield_little_mean_log, group = year, color = year)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm") + facet_wrap(~site_name)
```

#### To log or not to log?

Does using log-transformed data fundamentally change the nature of the relationship? What does the g~G relationship look like if we use data on the original scale? Means, for example:

```{r}
dat_sp2 %>% mutate(isevent = dplyr::recode(isevent, "1" = "Event", "2" = "Baseflow")) %>% ggplot(aes(x = yield_big_mean, y = yield_little_mean, group = site_name, color = site_name)) + geom_point(alpha = 0.5) + geom_abline(intercept = 0, slope = 1, linetype = "dashed") + geom_smooth(method = "lm", se = F)
```

## DEPRECATED

## Sensitivity analyses

### Missing data

Many of the EcoDrought time series data are incomplete. At some sites, discharge data are available only during the summer and/or fall periods, and at other sites, time series data are interrupted due to malfunctioning sensors and/or ice formation ("ice spikes"). So how does the length of the time series affect baseflow separation (and subsequent event identification)? Wasko and Guo (2022) use a 67-day time series of flow to demonstrate the utility of the *hydroEvents* package, suggesting digital baseflow separation techniques may be valid for relatively short time series. Here, I perform a simple sensitivity analysis to explore the effect of time series length on the results of baseflow separation.
Essentially, perform baseflow separation on increasingly smaller subsets of the data. With the default parameters, the minimum number of days/observations needed is 31. This is because the default number of points reflected at the start and end of the data (r) is 30. Reflection allows bf/bfi to be calculated over the entire period of record, as the underlying baseflow separation equations result in "issues of 'warm-up' and 'cool-down' as the recursive filter is moved forward and backward over the dataset" (Ladson et al. 2013, Australian Journal of Water Resources). *baseflowB()* uses a default reflection period of 30, which Ladson et al. (2013) found to "provide a realistic baseflow response for the start and end of the actual flow data".

```{r eval=FALSE, include=FALSE}
dat_wb_sens <- dat_wb %>%
  mutate(bf_c = baseflowB(Yield_filled_mm)$bf, bfi_c = baseflowB(Yield_filled_mm)$bfi) %>%
  select(date, bf_c, bfi_c) %>%
  left_join(dat_wb[1:(365*3),] %>% mutate(bf_3y = baseflowB(Yield_filled_mm)$bf, bfi_3y = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_3y, bfi_3y)) %>%
  left_join(dat_wb[1:(365),] %>% mutate(bf_1y = baseflowB(Yield_filled_mm)$bf, bfi_1y = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_1y, bfi_1y)) %>%
  left_join(dat_wb[1:(182),] %>% mutate(bf_6m = baseflowB(Yield_filled_mm)$bf, bfi_6m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_6m, bfi_6m)) %>%
  left_join(dat_wb[1:(90),] %>% mutate(bf_3m = baseflowB(Yield_filled_mm)$bf, bfi_3m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_3m, bfi_3m)) %>%
  left_join(dat_wb[1:(35),] %>% mutate(bf_1m = baseflowB(Yield_filled_mm)$bf, bfi_1m = baseflowB(Yield_filled_mm)$bfi) %>% select(date, bf_1m, bfi_1m))
```

#### Compare baseflow

Divergence in baseflow among datasets is a result of the reflected data of the shorter dataset not matching the actual data of the longer dataset. As a result, divergence really only occurs at the end of each time series and is generally small in magnitude.
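The 31-observation minimum follows directly from the reflection construction: with r = 30, a series must have at least r + 1 points for the mirrored pad to exist. A minimal sketch of the padding itself (an illustration of the idea, not the baseflowB() internals):

```r
# Reflect r points at each end of a series (sketch; not the baseflowB() internals)
reflect_pad <- function(q, r = 30) {
  stopifnot(length(q) >= r + 1)  # need at least r+1 points to mirror r of them
  c(rev(q[2:(r + 1)]),           # mirror about the first point
    q,
    rev(q[(length(q) - r):(length(q) - 1)]))  # mirror about the last point
}

q <- seq_len(40)      # toy series of 40 "days"
qp <- reflect_pad(q)
length(qp)            # 40 + 2*30 = 100
qp[1:3]               # mirrored start: 31, 30, 29
```

The filter is then run over the padded series and the 2r reflected values are discarded, so the "warm-up" and "cool-down" transients fall on synthetic data rather than the record itself.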
```{r eval=FALSE, include=FALSE}
#| fig-cap: "Time series of baseflow derived from datasets of different lengths."
dat_wb_sens %>% select(date, bf_c, bf_3y, bf_1y, bf_6m, bf_3m, bf_1m) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily baseflow in yield (mm)") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

```{r message = FALSE, warning = FALSE, eval=FALSE, include=FALSE}
#| fig-cap: "Pairs plot of baseflow derived from datasets of different lengths. Red lines are 1:1."
ggpairs(dat_wb_sens %>% select(bf_c, bf_3y, bf_1y, bf_6m, bf_3m, bf_1m)) + geom_abline(intercept = 0, slope = 1, color = "red")
```

#### Compare baseflow index

The story here is essentially the same as above: divergence is minimal and restricted to the end of each time series. However, we note that divergence in BFI appears to increase as absolute flow/baseflow decreases, because small differences in absolute space become much larger in relative space when absolute values are small.

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Time series of baseflow index derived from datasets of different lengths."
dat_wb_sens %>% select(date, bfi_c, bfi_3y, bfi_1y, bfi_6m, bfi_3m, bfi_1m) %>% dygraph() %>% dyRangeSelector() %>% dyAxis("y", label = "Daily baseflow index") %>% dyOptions(fillGraph = TRUE, drawGrid = FALSE, axisLineWidth = 1.5)
```

```{r message = FALSE, warning = FALSE, eval=FALSE, include=FALSE}
#| fig-cap: "Pairs plot of baseflow index derived from datasets of different lengths. Red lines are 1:1."
ggpairs(dat_wb_sens %>% select(bfi_c, bfi_3y, bfi_1y, bfi_6m, bfi_3m, bfi_1m)) + geom_abline(intercept = 0, slope = 1, color = "red")
```

View relationship between Big G and among-site/event-specific standard deviation in little g.

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Derived from log cumulative yield"
dat_wb2 %>% select(agneventid, yield_big_cumul_log, yield_little_cumul_log) %>% group_by(agneventid) %>% summarize(unq_big = unique(yield_big_cumul_log), sd_little = sd(yield_little_cumul_log)) %>% ggplot(aes(x = unq_big, y = sd_little)) + geom_point() + geom_smooth() + ylim(0, 1.5)
```

```{r eval=FALSE, include=FALSE}
#| fig-cap: "Derived from mean log yield"
dat_wb2 %>% select(agneventid, yield_big_mean_log, yield_little_mean_log) %>% group_by(agneventid) %>% summarize(unq_big = unique(yield_big_mean_log), sd_little = sd(yield_little_mean_log)) %>% ggplot(aes(x = unq_big, y = sd_little)) + geom_point() + geom_smooth() + ylim(0, 1.5)
```
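The event-level summary behind those last plots (one Big G value per event, among-site spread of little g) can be sketched without dplyr; this is a self-contained toy with made-up values, where the column names mirror dat_wb2 but nothing else is taken from the real data:

```r
# Toy event-level table mimicking dat_wb2 (values are made up)
set.seed(2)
toy <- data.frame(
  agneventid = rep(paste0("2020_", 1:4), each = 3),  # 4 events x 3 little g sites
  yield_big_mean_log = rep(rnorm(4), each = 3),      # one Big G value per event
  yield_little_mean_log = rnorm(12)                  # site-specific little g values
)

# per-event Big G value and among-site SD of little g
unq_big <- tapply(toy$yield_big_mean_log, toy$agneventid, unique)
sd_little <- tapply(toy$yield_little_mean_log, toy$agneventid, sd)
summ <- data.frame(agneventid = names(unq_big),
                   unq_big = as.numeric(unq_big),
                   sd_little = as.numeric(sd_little))
summ
```

Note that `unique()` as a per-event summary relies on Big G being constant within an event, which is guaranteed by the join structure above; if it were not, `tapply()` would return a list and the `as.numeric()` step would fail loudly.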